Summary

Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time, and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where a sensitivity analysis strategy based on the extrapolation method is also demonstrated.

Keywords: Bayesian analysis; HIV/AIDS; Marginal model; Missing data; Sensitivity analysis. 

1. Introduction 

Dropout occurs commonly in longitudinal studies. For example, in the HIV Epidemiology Research Study (HERS), an HIV cohort study of 1310 women from 1993 to 2000, it was of interest to examine the time course of depression (defined as a Center for Epidemiologic Studies Depression Scale score of 16 or greater) in HIV-infected women and its associated factors (Smith and others, 1997; Ickovics and others, 2001; Su and Hogan, 2010). At baseline, the HERS women were scheduled to be followed up every 6 months for 12 visits. However, the dropout rate in the HERS was appreciable: among the 753 women who were HIV-infected at baseline and did not die of HIV-related causes during the study period, only 173 had a depression observation at the 12th visit. Moreover, previous studies have suggested that dropout could be related to disease progression and associated depressive symptoms (Ickovics and others, 2001; Roy and Daniels, 2008; Su and Hogan, 2010). As the actual measurement times correspond to assessment dates and vary across women (see Figure 1 of Su and Hogan, 2010), we follow Su and Hogan (2010) and consider dropout in the HERS to occur in continuous time.

When dropout depends on the unobserved response at the time of dropout, or at future times, even after conditioning on the observed data, it is called "informative" or "nonignorable." To deal with informative dropout, a variety of model-based approaches, including "selection" models (SMs), "pattern mixture" models (PMMs), and "shared parameter" models, have been proposed for the joint modeling of the response and dropout processes (Wu and Carroll, 1988; Diggle and Kenward, 1994; Follmann and Wu, 1995; Ten Have and others, 1998; Wu and Bailey, 1989; Little, 1993, 1994; Hogan and Laird, 1997; Wulfsohn and Tsiatis, 1997; Henderson and others, 2000; Tsiatis and Davidian, 2004; Ibrahim and Molenberghs, 2009). Semiparametric approaches have also been proposed to adjust for the dependence of the dropout time on the unobserved responses (Rotnitzky and others, 1998; Scharfstein and others, 1999; Lin and Ying, 2003; Wilkins and Fitzmaurice, 2007).

Within the PMMs framework, conditional linear models (CLMs) by Wu and Bailey (1989) are a useful 
approach to deal with dropout that can occur at any point in continuous time (not just at observation times). 
However, one disadvantage of CLMs and PMMs compared with SMs is that their parameters usually lack 
a direct interpretation in terms of marginal covariate effects if nonidentity link functions are used in 
the mean structures (Wilkins and Fitzmaurice, 2007; Roy and Daniels, 2008; Su and Hogan, 2010). For 
some scenarios with only treatment groups and measurement times as the covariates, we can obtain the 
marginal summaries for covariate strata by averaging the response distributions over the dropout patterns 
(Fitzmaurice and Laird, 2000; Su and Hogan, 2010). When a number of confounders or quantitative 
covariates are present, a simple summary of the marginal covariate effects might not be immediately 
available in a CLM or PMM. 

To overcome this limitation, several PMMs have been proposed. Building upon log-linear models, 
Wilkins and Fitzmaurice (2006) developed a marginalized PMM for short sequences of binary data, 
where the conditional dependencies among the responses and between the responses and dropout patterns 
are specified separately in addition to the marginal mean model. To avoid the proliferation of nuisance 
parameters in full likelihood approaches, Wilkins and Fitzmaurice (2007) proposed a PMM using the 
semiparametric moment-based approach. Focusing on the scenarios with many unique dropout patterns, 
Roy and Daniels (2008) developed a PMM where the marginal mean follows a generalized linear model 
and the mean conditional on the latent class and random effects is specified separately. However, mainly 
because of the concerns about sample size per dropout pattern and model parsimony, these models may 
not be directly applicable to the situation where measurement times are irregular across individuals and 
dropout can occur at any point in continuous time. 

In this article, within the Bayesian paradigm, we propose a marginalized conditional linear model (MCLM) to deal with continuous-time informative dropout for long sequences of binary data when the target of inference is the marginal covariate effects. Given the dropout time, models for the mean and dependence (including serial dependence and nondiminishing dependence) structures of the binary responses are specified separately (Heagerty, 2002; Schildcrout and Heagerty, 2007; Roy and Daniels, 2008), while parameters in both models are allowed to depend on the dropout time through linear or quadratic formulations, as in the original CLMs. With marginal covariate effects directly specified, we then marginalize the conditional mean over the unspecified dropout time distribution through Rubin's Bayesian bootstrap (Rubin, 1981). Following Su and Hogan (2010), we build the MCLM within the Bayesian paradigm to avoid the extra bootstrapping of the continuous dropout time that nonparametric frequentist approaches require for standard error estimation when the delta method fails (Hogan and others, 2004).

One advantage of PMMs and CLMs over other approaches is that the unidentifiable part of the model, used for extrapolating missing data, can be distinguished from the part identifiable from the observed data, which facilitates substantive critique and empirical sensitivity analysis (Little and Wang, 1996; Daniels and Hogan, 2000, 2008; Rotnitzky and others, 2001). In this article, we will illustrate the unverifiable assumptions in the proposed MCLM and demonstrate sensitivity analysis strategies based on the extrapolation method (Rizopoulos and others, 2007) using the HERS depression data.

The remainder of this article is organized as follows. In Section 2, we introduce the model. Computational details are provided in Section 3. In Section 4, we apply our methods to the HERS depression data and conduct a sensitivity analysis to assess the impact of unverifiable assumptions on the scientific conclusions. Conclusions and discussion follow in Section 5.

2. Model 

Let $D_i$ denote the dropout time for the $i$th individual ($i = 1, \ldots, N$). At continuous time points $t_{i1}, \ldots, t_{in_i}$ ($t_{in_i} \le D_i$), we observe the binary responses $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$ and the $n_i \times p$ exogenous covariate matrix $X_i = (x_{i1}, \ldots, x_{in_i})^T$ (e.g. external or fixed by study design). When dropout is informative in the sense that it is related to the unobserved responses given the observed data, we need to jointly model $(Y_i, X_i, D_i)$. Specifically, building on the marginalized transition and latent variable model (mTLV) of Schildcrout and Heagerty (2007) for long series of binary data, we develop an MCLM in which the conditional mean and dependence given the dropout time, as well as the marginal mean, are specified separately. Our model formulation involves 4 components:

(a) Marginal model for the mean of the $j$th response, $\mu^M_{ij} = E(Y_{ij} \mid x_{ij})$.

(b) Conditional model for the mean of the $j$th response given the dropout time (pattern) $D_i$, $\mu^P_{ij} = E(Y_{ij} \mid x_{ij}, D_i)$.

(c) Dependence model for the responses given the dropout time $D_i$, $E(Y_{ij} \mid Y_{ij-1}, \ldots, Y_{i1}, b_i, x_{ij}, D_i)$, where $b_i$ is an individual-level random intercept.

(d) Marginal model for the dropout time distribution, $f(D_i \mid X_i)$.
To specify (a), we assume that

$$ g(\mu^M_{ij}) = x_{ij}^T \beta, \tag{2.1} $$

where $g(\cdot)$ is a link function, $j = 1, \ldots, n_i$, and $\beta$ is a $p \times 1$ vector of marginal regression coefficients. Both (b) and (c) capture the association between the binary responses and the dropout time. In particular, we assume that

$$ g(\mu^P_{ij}) = \delta_{ij} + z_{ij}^T \alpha(D_i), \tag{2.2} $$

where $z_{ij}$ is a subset of $x_{ij}$ and $\alpha(\cdot)$ is a $q \times 1$ vector of linear or quadratic functions of the dropout time $D_i$. For identifiability, we constrain $\alpha(\cdot)$ such that $\alpha(T) = 0$, where $T$ denotes the study end or the maximum follow-up time in the study. Because (2.1) and (2.2) are linked through

$$ E(Y_{ij} \mid x_{ij}) = \int E(Y_{ij} \mid x_{ij}, D_i)\, f(D_i \mid X_i)\, dD_i, $$

the intercept $\delta_{ij}$ is implicitly a function of $\beta$, $\alpha(\cdot)$, the parameters for (d), and the covariates $X_i$.
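To make the implicit definition of $\delta_{ij}$ concrete, the Python sketch below assumes a logit link and represents $f(D_i \mid X_i)$ by a finite set of support points with probability weights (as a single Bayesian bootstrap draw would provide). Under these assumptions, $\delta_{ij}$ solves a one-dimensional equation whose left-hand side is strictly increasing in $\delta_{ij}$, so bisection suffices. The function names are hypothetical; this is an illustration, not the authors' MATLAB implementation.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
import math
from typing import Sequence


def expit(x: float) -> float:
    """Inverse logit, computed without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def solve_delta(mu_marginal: float,
                offsets: Sequence[float],
                weights: Sequence[float],
                lo: float = -30.0,
                hi: float = 30.0,
                tol: float = 1e-12) -> float:
    """Return delta with sum_k w_k * expit(delta + offsets[k]) == mu_marginal.

    offsets[k] stands for z_ij^T alpha(D_k) at the k-th support point of the
    dropout-time distribution and weights[k] is its probability mass
    (assumed to sum to 1). The left-hand side increases strictly in delta,
    so bisection on [lo, hi] finds the unique root.
    """
    def gap(delta: float) -> float:
        mean = sum(w * expit(delta + o) for w, o in zip(weights, offsets))
        return mean - mu_marginal

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Example: marginal logit x_ij^T beta = -0.5 and three dropout-time support points.
mu_m = expit(-0.5)
offsets = [0.4, -0.1, 0.0]   # z_ij^T alpha(D_k); the last point is a completer (alpha(T) = 0)
weights = [0.2, 0.3, 0.5]
delta = solve_delta(mu_m, offsets, weights)
```

In a full fit, this solve would be repeated for every $(i, j)$ and for every posterior draw of $\beta$, $\alpha(\cdot)$, and the dropout-time weights.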

The model in (2.1) is chosen to obtain the desired target of inference: marginal covariate effects. The conditional mean model in (2.2) specifies how the response mean for individuals differs by their dropout times $D_i$, which is consistent with the specification of the original CLM by Wu and Bailey (1989). In other words, we allow the response mean to depend on the dropout process through a parametric formulation (e.g. linear or quadratic functions), as in a CLM. It must be recognized that unverifiable assumptions in (b) influence the inferences about the parameters in (a). For example, in the HERS example, if $z_{ij}$ includes the time variable $t_{ij}$ and its corresponding coefficient is $\alpha(D_i) = \theta_0 + \theta_1 D_i$, then early dropouts are allowed to have different time slopes of depression compared with later dropouts. However, we then assume that the time slope before dropout at $D_i$ can be extrapolated to characterize the time slope after dropout, although no post-dropout data are available to assess the validity of this assumption. Therefore, sensitivity analysis is required, and we will demonstrate the corresponding strategies using the HERS example in Section 4.

The purpose of (c) is to account for the dependence between binary responses within individuals and to allow full likelihood-based inference for long series of binary data. Following Schildcrout and Heagerty (2007), we consider both serial dependence with a Markov component and nondiminishing dependence with a random intercept. Specifically, the mean of $Y_{ij}$ conditional on its history $Y_{i1}, \ldots, Y_{ij-1}$, the random intercept $b_i$, the covariates $x_{ij}$, and the dropout time $D_i$ is $\mu^c_{ij} = E(Y_{ij} \mid Y_{ij-1}, \ldots, Y_{i1}, b_i, x_{ij}, D_i) = E(Y_{ij} \mid Y_{ij-1}, b_i, x_{ij}, D_i)$, with

$$ \mathrm{logit}(\mu^c_{ij}) = \Delta_{ij} + \gamma_{ij}(D_i)\, Y_{ij-1} + b_i, \qquad b_i \sim N\{0, \sigma^2(D_i)\}. \tag{2.3} $$

Although a logit link function is used here, any valid link function can be adopted (Heagerty, 2002). For simplicity, the dependence of $\Delta_{ij}$, $\gamma_{ij}(D_i)$, and $\sigma^2(D_i)$ on $x_{ij}$ is suppressed for now. Given $b_i$, the log odds ratio $\gamma_{ij}(D_i)$ measures the serial dependence between $Y_{ij}$ and the immediately preceding response $Y_{ij-1}$ among those who drop out at $D_i$; $b_i$ induces the nondiminishing (long-range) dependence between responses within individuals. The intercept $\Delta_{ij}$ is determined such that the conditional mean model in (2.2) and the dependence model in (2.3) are simultaneously satisfied (Schildcrout and Heagerty, 2007). In other words, $\Delta_{ij}$ is the solution to

$$ E(Y_{ij} \mid x_{ij}, D_i) = E_{b_i}\big[ E_{Y_{ij-1} \mid b_i} \{ \mathrm{logit}^{-1}(\Delta_{ij} + \gamma_{ij}(D_i)\, Y_{ij-1} + b_i) \} \big]. $$
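The equation for $\Delta_{ij}$ can be solved recursively over $j$, in the spirit of Schildcrout and Heagerty (2007). The Python sketch below is one way to do this under stated assumptions: it fixes a dropout time, takes the pattern-conditional means $E(Y_{ij} \mid x_{ij}, D_i)$, the serial log odds ratios $\gamma_{ij}(D_i)$, and the random-intercept standard deviation $\sigma(D_i)$ as given, omits the serial term at $j = 1$, and integrates over $b_i$ with Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`). Because $Y_{ij-1}$ is binary, $P(Y_{ij} = 1 \mid b_i)$ follows from $P(Y_{ij-1} = 1 \mid b_i)$, which is carried forward at the quadrature nodes. Function names are hypothetical.

```python
# Illustrative sketch; function names and the j = 1 convention are assumptions.
import numpy as np


def expit(x):
    """Elementwise inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def solve_Delta(mu_p, gamma, sigma, n_nodes=30, tol=1e-10):
    """Solve for Delta_i1, ..., Delta_in so that (2.2) and (2.3) agree.

    mu_p[j]  : pattern-conditional mean E(Y_ij | x_ij, D_i)
    gamma[j] : serial log odds ratio gamma_ij(D_i); gamma[0] is unused
    sigma    : random-intercept standard deviation sigma(D_i)
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * x      # quadrature nodes for b_i ~ N(0, sigma^2)
    wb = w / np.sqrt(np.pi)           # quadrature weights, summing to 1
    Delta = np.empty(len(mu_p))
    p_prev = None                     # P(Y_{i,j-1} = 1 | b_i) at each node

    for j, target in enumerate(mu_p):
        def p_given_b(d):
            """P(Y_ij = 1 | b_i) at each node for a candidate Delta_ij = d."""
            if p_prev is None:        # first visit: no previous response
                return expit(d + b)
            return (p_prev * expit(d + gamma[j] + b)
                    + (1.0 - p_prev) * expit(d + b))

        lo, hi = -30.0, 30.0          # E_b[P(Y_ij = 1 | b)] increases in d
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if wb @ p_given_b(mid) > target:
                hi = mid
            else:
                lo = mid
        Delta[j] = 0.5 * (lo + hi)
        p_prev = p_given_b(Delta[j])  # carry P(Y_ij = 1 | b_i) forward
    return Delta


mu_p = np.array([0.30, 0.40, 0.25])
gamma = np.array([0.0, 1.2, 1.2])
Delta = solve_Delta(mu_p, gamma, sigma=1.3)
```

The recursion costs one bisection per visit, which matters for long series such as the HERS data; enumerating all response histories instead would grow exponentially with $n_i$.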

Further, the serial dependence measure $\gamma_{ij}(D_i)$ and the random intercept variance $\sigma^2(D_i)$ can be modeled via

$$ \gamma_{ij}(D_i) = w_{ij}^T \phi(D_i), \tag{2.4} $$
$$ \log\{\sigma^2(D_i)\} = v_i^T \psi(D_i), \tag{2.5} $$

where $w_{ij}$ and $v_i$ are subsets of $x_{ij}$, and $\phi(\cdot)$ and $\psi(\cdot)$ are vectors of linear or quadratic functions of the dropout time $D_i$. For example, $w_{ij}$ can include the gap time between 2 consecutive visits, which accommodates irregular spacing of measurement times. Similarly, $v_i$ can include treatment group membership so that the random intercept variance differs by treatment group, with this treatment effect varying by the dropout time.

By allowing the dependence parameters in (2.3) to vary by $D_i$, our MCLM has a different within-individual dependence structure from a CLM that only allows the mean parameters, e.g. those in (2.2), to vary by $D_i$. It is well known that with complete data and likelihood-based approaches, properly modeling the within-individual dependence structure can affect the variability estimates more than the point estimates of the mean parameters (Diggle and others, 2002). With missing data, however, even point estimates can be biased if the dependence structure is not carefully modeled (Kurland and Heagerty, 2004; Daniels and Hogan, 2008). By including covariates and allowing dependence on the dropout time in the dependence model, we aim to minimize these biases.

Finally, component (d) needs to be specified to complete the joint distribution of $(Y_i, X_i, D_i)$. In principle, this can be modeled using any event time distribution, where the dependence on $X_i$ can be checked by standard event time regression methods. Here, we adopt a nonparametric approach and leave $f(D_i \mid X_i)$ completely unspecified within the strata of $X_i$. Following Su and Hogan (2010), we use Rubin's Bayesian bootstrap (Rubin, 1981) to obtain the posterior of $f(D_i \mid X_i)$ for the observed dropout times (see details in the Supplementary material available at Biostatistics online).
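A minimal Python sketch of a stratified Bayesian bootstrap, assuming the strata are discrete covariate groups (in the HERS analysis of Section 4, race by baseline CD4 group): each posterior draw of $f(D_i \mid X_i)$ places its mass on the observed dropout times in a stratum, with flat Dirichlet weights drawn via NumPy's `Generator.dirichlet`. The function name and return layout are hypothetical, and the Supplementary material may differ in detail.

```python
# Illustrative sketch; the function name and return layout are hypothetical.
import numpy as np


def bayesian_bootstrap_draws(dropout_times, strata, n_draws, seed=0):
    """Rubin's (1981) Bayesian bootstrap of f(D | X), run separately within strata.

    Within each stratum, every posterior draw of the dropout-time distribution
    puts all its mass on the observed dropout times of that stratum, with
    probabilities drawn from a flat Dirichlet(1, ..., 1) distribution.

    Returns {stratum: (support_times, weights)}, where weights has shape
    (n_draws, number of individuals in the stratum) and each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(dropout_times, dtype=float)
    labels = np.asarray(strata)
    draws = {}
    for s in np.unique(labels):
        support = times[labels == s]
        weights = rng.dirichlet(np.ones(support.size), size=n_draws)
        draws[s] = (support, weights)
    return draws
```

Each row of `weights`, paired with `support_times`, is one draw of $f(D_i \mid X_i)$ that can be used to average the conditional mean (2.2) over $D_i$ within the corresponding stratum.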

3. Computational details 

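We let $\theta$ denote the set of parameters that characterize the functions $\alpha(\cdot)$ in the conditional mean model (2.2), let $\lambda$ denote the set of parameters that characterize the dependence model in (2.3)-(2.5), and let $\pi$ index the dropout time distribution $f(D_i \mid X_i; \pi)$. The likelihood contribution from the response data of the $i$th individual is

$$ \begin{aligned} f(y_i \mid b_i, X_i, D_i; \beta, \theta, \lambda) &= f(y_{i1} \mid b_i, x_{i1}, D_i; \beta, \theta, \lambda)\, f(y_{i2} \mid y_{i1}, b_i, x_{i2}, D_i; \beta, \theta, \lambda) \cdots f(y_{in_i} \mid y_{in_i-1}, b_i, x_{in_i}, D_i; \beta, \theta, \lambda) \\ &= \prod_{j=1}^{n_i} f(y_{ij} \mid y_{ij-1}, b_i, x_{ij}, D_i; \beta, \theta, \lambda), \end{aligned} $$

with the convention that the $j = 1$ factor does not condition on a previous response.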
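Under the logit dependence model (2.3), each factor of this product is a Bernoulli probability with mean $\mu^c_{ij}$, so its logarithm can be computed as in the Python sketch below. The sketch assumes $\Delta_{ij}$ and $\gamma_{ij}(D_i)$ have already been computed for individual $i$, drops the serial term at $j = 1$, and uses `numpy.logaddexp` for numerical stability; the function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical, not from the paper.
import numpy as np


def loglik_given_b(y, Delta, gamma, b):
    """log f(y_i | b_i, X_i, D_i) for the transition model (2.3).

    y     : observed binary responses y_i1, ..., y_in (0/1)
    Delta : intercepts Delta_ij
    gamma : serial log odds ratios gamma_ij(D_i); gamma[0] is unused
    b     : random intercept b_i
    """
    y = np.asarray(y, dtype=float)
    y_prev = np.concatenate(([0.0], y[:-1]))        # no previous response at j = 1
    eta = np.asarray(Delta, dtype=float) + np.asarray(gamma, dtype=float) * y_prev + b
    # Bernoulli log-likelihood on the logit scale: y * eta - log(1 + exp(eta))
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))
```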

The posterior distribution of the parameters in an MCLM is proportional to

$$ \prod_{i=1}^{N} \{ f(y_i \mid b_i, X_i, D_i; \beta, \theta, \lambda)\, f(b_i \mid \lambda)\, f(D_i \mid X_i; \pi) \}\, p(\beta, \theta, \lambda)\, p(\pi), $$

where $p(\cdot)$ is a prior density function. Following the specification of the original PMMs in the Bayesian paradigm (Daniels and Hogan, 2008), we assume that the prior for $\pi$ is independent of the prior for $(\beta, \theta, \lambda)$. It follows that $\pi$ is not part of the posterior for $(\beta, \theta, \lambda)$, and inference for $\pi$ can be based on the marginal likelihood $\prod_{i=1}^{N} f(D_i \mid X_i; \pi)$.

We standardize the continuous covariates to have mean 0 and standard deviation 0.5, as recommended by Gelman (2008), and assign independent $t$ priors with 7 degrees of freedom and scale 2.5 (Gelman and others, 2008) to the elements of $\beta$ and $\theta$, as well as to the serial dependence parameters within $\lambda$ in (2.4). Independent $N(0, 7)$ priors are used for the random intercept variance parameters (on the log scale) within $\lambda$ in (2.5). The Markov chain Monte Carlo (MCMC) algorithm for posterior sampling is implemented in MATLAB (version 7.1); more details can be found in the Supplementary material available at Biostatistics online.
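The two ingredients named above can be sketched in Python as follows: the Gelman (2008) rescaling (subtract the mean and divide by two standard deviations; the population SD, `ddof=0`, is an assumption here) and the log density of a location-zero Student-$t$ prior with 7 degrees of freedom and scale 2.5. Function names and the example visit days are hypothetical.

```python
# Illustrative sketch; function names and example data are hypothetical.
import math

import numpy as np


def gelman_standardize(x):
    """Rescale to mean 0 and standard deviation 0.5 by dividing by two SDs."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2.0 * x.std())


def log_t_prior(value, df=7.0, scale=2.5):
    """Log density of a Student-t prior centred at 0 with the given df and scale."""
    z = value / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1.0) / 2.0 * math.log1p(z * z / df))


days = np.array([0.0, 182.0, 365.0, 730.0, 1095.0])   # illustrative visit days
t_std = gelman_standardize(days)
```

Dividing by two SDs puts continuous covariates on roughly the same scale as centered binary indicators, which is what makes a common prior scale of 2.5 reasonable for all coefficients.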

4. Example 

As briefly described in Section 1, our goal is to characterize the depression time course for the 753 HERS women. We exclude the women who died of HIV-related causes during the study period, because response-related death mixed with dropout (Kurland and Heagerty, 2005) is a separate problem that needs further research and is beyond the scope of this article. Depression was measured using the Center for Epidemiologic Studies Depression Scale (CES-D), which ranges from 0 to 60, with larger scores indicating more symptoms. Following Su and Hogan (2010), we focus on the dichotomized CES-D data (score of 16 or greater), a cutoff commonly used to define clinically significant depression in HIV research (Radloff, 1977; Ickovics and others, 2001; Cook and others, 2004; Leserman, 2008). The analysis of the continuous and binary HERS CES-D data using the original PMM approach (i.e. with marginal covariate effects not directly specified) can be found in Sections 4.1 and 4.2 of Su and Hogan (2010).

The covariates of interest include baseline characteristics, such as race (Black/White/Latina and others) and initial disease stage (defined as whether the baseline CD4 count is > 200), as well as the time variable (in days). Following Gelman (2008), the time variable is standardized to have mean 0 and standard deviation 0.5.

4.1 Models under comparison

We fit an mTLV (Schildcrout and Heagerty, 2007) and an MCLM to the HERS depression data. Assuming "missingness at random" (MAR) and prior independence of the parameters in the response model and the dropout time distribution, the missingness is ignorable in the mTLV (Little and Rubin, 2002). In both models, the marginal mean of depression follows

$$ \begin{aligned} \mathrm{logit}(\mu^M_{ij}) = {}& \beta_0 + \beta_1 I(\text{Black}) + \beta_2 I(\text{Latina}) + \beta_3 I(\text{baseline CD4} > 200) \\ & + \beta_4 I(\text{baseline CD4} \le 200)\, t_{ij} + \beta_5 I(\text{baseline CD4} \le 200)\, t_{ij}^2 \\ & + \beta_6 I(\text{baseline CD4} > 200)\, t_{ij} + \beta_7 I(\text{baseline CD4} > 200)\, t_{ij}^2, \end{aligned} \tag{4.1} $$

where $I(\cdot)$ is the indicator function and $t_{ij}$ is the standardized time. The quadratic term in time is included to allow more flexibility in characterizing the depression time course.

In the mTLV, no conditional mean model given the dropout time is needed, and the dependence structure includes constant first-order serial dependence and a random intercept for nondiminishing dependence:

$$ \mathrm{logit}(\mu^c_{ij}) = \Delta_{ij} + \gamma\, Y_{ij-1} + b_i, \qquad b_i \sim N(0, \sigma^2), \qquad \log(\sigma^2) = \eta. $$
The conditional mean model in the MCLM is specified as follows:

$$ \begin{aligned} \mathrm{logit}(\mu^P_{ij}) = {}& \delta_{ij} + \theta_1 D_i^* I(\text{baseline CD4} > 200) \\ & + \theta_2 D_i^* I(\text{baseline CD4} \le 200)\, t_{ij} \\ & + \theta_3 D_i^* I(\text{baseline CD4} > 200)\, t_{ij}, \end{aligned} \tag{4.2} $$

where the standardized dropout time $D_i^* = (D_i - T)/T$ lies within $[-1, 0]$, and $T = 2093$ corresponds to the maximum follow-up (in days) in the HERS. The choice of covariates here is based on the analysis reported in Su and Hogan (2010), where the regression coefficients for race were found to be relatively constant over the dropout time. That is, we allow the regression coefficients in (4.2) to vary as linear functions of the dropout time; for women who reached the maximum follow-up in the HERS, these coefficients are set to 0 for identifiability, because we have specified a separate model (4.1) for the marginal mean of depression. Further, both the first-order serial dependence and the nondiminishing dependence are assumed to be linearly related to the dropout time as follows:

$$ \mathrm{logit}(\mu^c_{ij}) = \Delta_{ij} + \gamma_{ij}(D_i)\, Y_{ij-1} + b_i, \qquad b_i \sim N\{0, \sigma^2(D_i)\}, \tag{4.3} $$
$$ \gamma_{ij}(D_i) = \lambda_0 + \lambda_1 D_i/T, $$
$$ \log\{\sigma^2(D_i)\} = \lambda_2 + \lambda_3 D_i/T. $$

Note that if $\theta_1 = \theta_2 = \theta_3 = \lambda_1 = \lambda_3 = 0$, the MCLM reduces to the mTLV under MAR.
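The dropout-dependent pieces of (4.2) and (4.3) are simple functions of $D_i$; the Python sketch below evaluates them and checks two properties stated above: the conditional-mean offset vanishes at the maximum follow-up $T = 2093$ days, and setting $\theta_1 = \theta_2 = \theta_3 = \lambda_1 = \lambda_3 = 0$ removes all dependence on $D_i$ (the mTLV case). The class and function names are hypothetical, $t_{ij}$ is assumed to be already standardized, and the posterior means from Table 1 are used only as illustrative inputs.

```python
# Illustrative sketch; class and function names are hypothetical.
import math
from dataclasses import dataclass

T_MAX = 2093.0  # maximum follow-up in the HERS, in days


@dataclass
class MCLMParams:
    theta1: float
    theta2: float
    theta3: float
    lam0: float
    lam1: float
    lam2: float
    lam3: float


def conditional_mean_offset(p, dropout_day, t_std, cd4_high):
    """Dropout-dependent part of logit(mu^P_ij) in (4.2); delta_ij is added separately."""
    d_star = (dropout_day - T_MAX) / T_MAX       # lies in [-1, 0]; 0 at maximum follow-up
    if cd4_high:                                 # baseline CD4 > 200
        return p.theta1 * d_star + p.theta3 * d_star * t_std
    return p.theta2 * d_star * t_std             # baseline CD4 <= 200


def dependence_params(p, dropout_day):
    """Serial log odds ratio gamma(D_i) and variance sigma^2(D_i) from (4.3)."""
    gamma = p.lam0 + p.lam1 * dropout_day / T_MAX
    sigma2 = math.exp(p.lam2 + p.lam3 * dropout_day / T_MAX)
    return gamma, sigma2


# Posterior means from Table 1, used only as illustrative inputs.
post = MCLMParams(theta1=-0.22, theta2=-0.46, theta3=0.20,
                  lam0=0.63, lam1=0.67, lam2=0.26, lam3=0.36)
early = dependence_params(post, 365.0)
late = dependence_params(post, T_MAX)
```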
To calculate the intercept $\delta_{ij}$, we need posterior samples of $f(D_i \mid X_i)$. First, we use Cox regression to check the relationship between the discrete covariates (race, baseline CD4 count) and the dropout time distribution. Whites and Blacks were less likely to drop out than Latinas and women of other races, and patients with baseline CD4 count > 200 were also less likely to drop out. Therefore, $f(D_i \mid X_i) \ne f(D_i)$ in the HERS data, and the Bayesian bootstrap of the observed dropout times is conducted within the race and baseline CD4 groups.

The priors assigned to $\beta_0, \ldots, \beta_7$, $\theta_1, \theta_2, \theta_3$, and $\gamma, \lambda_0, \lambda_1$ are $t$ priors with 7 degrees of freedom and scale 2.5. $N(0, 7)$ priors are used for $\eta$, $\lambda_2$, and $\lambda_3$. For both models, we run 2 MCMC chains and check convergence after a 5000-iteration burn-in period using history plots. The computing time for the mTLV and MCLM fits of the HERS example (6505 observations) is approximately 2 and 8 hours per 1000 iterations, respectively, on our machine (2.59 GHz CPU, 32 GB RAM). Pooled posterior samples of size 10000 are used for inference.
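The pooled summaries reported in Table 1 (posterior mean, SD, and 2.5% and 97.5% quantiles) can be computed as in the Python sketch below. The chain layout, burn-in handling, and function name are illustrative assumptions; the sketch does not reproduce the authors' MATLAB code or their history-plot convergence checks.

```python
# Illustrative sketch; the function name and chain layout are hypothetical.
import numpy as np


def summarize_pooled(chains, burn_in):
    """Pool post-burn-in draws across MCMC chains and summarize them as in Table 1.

    chains  : list of 1-D arrays, one per chain, of draws for a single parameter
    burn_in : number of initial draws discarded from each chain
    """
    pooled = np.concatenate([np.asarray(c, dtype=float)[burn_in:] for c in chains])
    lower, upper = np.percentile(pooled, [2.5, 97.5])
    return {
        "mean": float(pooled.mean()),
        "sd": float(pooled.std(ddof=1)),
        "2.5%": float(lower),
        "97.5%": float(upper),
        "n": int(pooled.size),
    }
```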

4.2 Results 

Table 1 presents the results from both the mTLV and the MCLM. In the MCLM, both the conditional mean regression coefficients and the dependence parameters suggest some association with the dropout time. Specifically, earlier dropouts are estimated to have a larger main effect of baseline CD4 count ($\theta_1$ [posterior mean] = -0.22, 95% credible interval (CI) = [-0.77, 0.34]). Among women with baseline CD4 counts ≤ 200, earlier dropouts had larger time slopes than later dropouts ($\theta_2$ = -0.46, 95% CI = [-1.67, 0.95]), whereas among women with baseline CD4 counts > 200, later dropouts had larger time slopes than earlier dropouts ($\theta_3$ = 0.20, 95% CI = [-0.64, 0.93]). In other words, early dropouts who had severe immunosuppression at baseline (CD4 ≤ 200) tended to have higher rates of change in depression than later dropouts, but for patients with baseline CD4 counts over 200, this pattern was reversed. However, because women with baseline CD4 > 200 were less likely to drop out, the influence of dropout on the binary responses is relatively small for them. Finally, the first-order serial dependence and the nondiminishing dependence are also estimated to increase with the dropout time ($\lambda_1$ = 0.67, 95% CI = [-0.36, 1.70]; $\lambda_3$ = 0.36, 95% CI = [-0.10, 0.87]). Overall, compared with the mTLV fit, the MCLM adjusted the marginal depression prevalence profiles upward in the later period of follow-up for the group with baseline CD4 ≤ 200, with the largest adjustment for the Latina/others group (left panel of Figure 1). On the other hand, for women with baseline CD4 counts > 200, the marginal depression prevalence profiles for both the White and Latina/others groups were shifted only slightly, and the general time trends remain stable (right panel of Figure 1).

Table 1. Results from the HERS analysis. Posterior means, standard deviations (SD), and 95% CIs (2.5% and 97.5% posterior quantiles) are reported for the marginal regression coefficients, conditional mean parameters, and dependence parameters from the fitted MCLM and mTLV. Blank cells indicate parameters not included in a model.

                      MCLM                            mTLV
Parameter    Mean     SD   2.5%  97.5%       Mean     SD   2.5%  97.5%
β0           0.28   0.22  -0.15   0.77       0.32   0.18  -0.06   0.63
β1          -0.19   0.13  -0.45   0.05      -0.26   0.11  -0.47  -0.04
β2           0.37   0.16   0.05   0.71       0.24   0.14  -0.03   0.53
β3           0.00   0.21  -0.37   0.39       0.02   0.18  -0.29   0.40
β4          -0.17   0.21  -0.62   0.18      -0.25   0.18  -0.57   0.09
β5          -0.59   0.28  -1.12   0.01      -0.66   0.29  -1.18  -0.05
β6          -0.29   0.08  -0.45  -0.12      -0.28   0.04  -0.37  -0.20
β7           0.19   0.10   0.00   0.39       0.24   0.10   0.02   0.40
θ1          -0.22   0.28  -0.77   0.34
θ2          -0.46   0.68  -1.67   0.95
θ3           0.20   0.39  -0.64   0.93
λ0           0.63   0.45  -0.26   1.53
λ1           0.67   0.52  -0.36   1.70
γ                                            1.19   0.09   1.02   1.36
λ2           0.26   0.21  -0.17   0.64
λ3           0.36   0.25  -0.10   0.87
η                                            0.55   0.05   0.46   0.66
σ²                                           1.74   0.09   1.58   1.93

Fig. 1. Posterior mean estimates of depression prevalence by race and baseline CD4 groups from the mTLV and MCLM fits of the HERS depression data (horizontal axes: days since enrollment).

Recall that when $\theta_1 = \theta_2 = \theta_3 = \lambda_1 = \lambda_3 = 0$, the MCLM reduces to the mTLV under MAR. Therefore, if MAR is violated, the parameters $\theta_1$, $\theta_2$, $\theta_3$, $\lambda_1$, and $\lambda_3$ quantify the degree to which MAR fails to hold. Since the estimated 95% CIs for all these parameters cover zero, there is no strong evidence from the HERS data that the MCLM fit is preferred to the mTLV fit under MAR. The goodness of fit of the MCLM was further assessed by posterior predictive checks based on completed-data plots obtained by multiple imputation of the missing responses (Gelman and others, 2005; see details in the Supplementary material available at Biostatistics online).

In summary, regardless of their baseline CD4 counts, Latinas and women of other races had higher depression prevalence over time than Blacks and Whites. Within each race group, women in both baseline CD4 groups had downward trends in depression prevalence over time, and the data do not provide sufficient evidence that these trends differ (see Figure 3).



4.3 Sensitivity analysis 

In the previous section, the mTLV and MCLM appeared to fit the observed HERS CES-D data similarly. However, the assumptions for extrapolating the missing responses given the observed data differ between these models. In the mTLV, MAR is assumed, so that the conditional distribution of the missing depression responses given the observed data for those who remained in the study at $d$ is the same as the corresponding conditional distribution for those who left the study at $d$ (Molenberghs and others, 1998), i.e.

$$ f(y_{ij} \mid y_{i1}, \ldots, y_{ij-1}, X_i, t_{ij-1} < D_i \le t_{ij}) = f(y_{ij} \mid y_{i1}, \ldots, y_{ij-1}, X_i, D_i > t_{ij}). $$

In the MCLM, we assume that, given the dropout time $d$ and the covariates, missing data after dropout share the same parameters as observed data before dropout. For example, in the HERS example, it is assumed that, given their baseline CD4 counts, women who dropped out at $d$ had the same time slope for $t_{ij} > d$ as for $t_{ij} \le d$. As illustrated in Figure 2, the time slope after dropout cannot be obtained from the observed data and has to be extrapolated in the MCLM. Neither the assumption in the mTLV nor that in the MCLM can be verified from the observed data, so sensitivity analysis is required (Little and Wang, 1996; Daniels and Hogan, 2000; Rotnitzky and others, 2001; Daniels and Hogan, 2008).

Fig. 2. Illustration of the unverifiable assumption made in the MCLM: the horizontal axis represents time since enrollment, the vertical axis represents the conditional mean of depression on the logit scale, and T represents the study end or maximum follow-up. At time d, some participants dropped out of the HERS, so the depression time slope after d is not estimable from the observed data. In the MCLM, the depression time slope before dropout is extrapolated to the period after dropout (solid line). In the corresponding sensitivity analysis, we allow the time slope after dropout to follow a piecewise linear model (dashed line); that is, the time slope before dropout is not necessarily equal to the time slope after dropout.

We now present an example of sensitivity analysis regarding the above assumption in the MCLM. The sensitivity analysis strategy for the MCLM can be based on the extrapolation method (Rizopoulos and others, 2007). Specifically, we assume a different time slope for $t_{ij} > d$, i.e. a continuous piecewise linear model with a change point at $d$ (see Figure 2). For the group with baseline CD4 ≤ 200, we assume the conditional mean model

$$ \mathrm{logit}(\mu^P_{ij}) = \delta_{ij} + \theta_2 D_i^* t_{ij} + \omega_0(D_i^*)(t_{ij} - \tilde{D}_i)_+, $$

where $(x)_+ = x$ if $x > 0$ and 0 otherwise, $\tilde{D}_i$ is the observed dropout time standardized to the same scale as $t_{ij}$, and $\omega_0(D_i^*)$ is the change in slope after dropout, which differs across dropout times. The model for baseline CD4 > 200 is similar, but with $\omega_1(D_i^*)$ representing the slope change after dropout:

$$ \mathrm{logit}(\mu^P_{ij}) = \delta_{ij} + \theta_1 D_i^* + \theta_3 D_i^* t_{ij} + \omega_1(D_i^*)(t_{ij} - \tilde{D}_i)_+. $$
In principle, sensitivity analysis should be based on parameters that cannot be identified from the observed data, such as $\omega_0(D_i^*)$ and $\omega_1(D_i^*)$. We assume a simple functional form for $\omega_0(\cdot)$ and $\omega_1(\cdot)$:

$$ \omega_0(D_i^*) = -a_0 D_i^* = -a_0 (D_i - T)/T, $$
$$ \omega_1(D_i^*) = -a_1 D_i^* = -a_1 (D_i - T)/T. $$

Thus, when $D_i = T$, the maximum follow-up (i.e. for study completers), no adjustment is made to the slope after dropout, while the slope is adjusted upward by $a_0$ or $a_1$ when $D_i = 0$, that is, when participants dropped out after the enrollment visit. For example, when $a_0 = 2$ and some HERS women with baseline CD4 ≤ 200 dropped out of the study at 1 year ($d = 365$), we assume that before dropout their time slopes are $\theta_2 (d - T)/T = -0.46(365 - 2093)/2093 = 0.38$, but that after dropout their time slopes are $(\theta_2 - a_0)(d - T)/T = (-0.46 - 2)(365 - 2093)/2093 = 2.03$.
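The slope arithmetic above, together with the piecewise linear predictor for the baseline CD4 ≤ 200 group, can be checked with the short Python sketch below. It assumes $\omega_0(D_i^*) = -a_0 D_i^*$ as stated, takes $t_{ij}$ and the change point $\tilde{D}_i$ on the same standardized time scale, and uses $\theta_2 = -0.46$ (the posterior mean) only for illustration; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
T_MAX = 2093.0  # maximum follow-up in the HERS, in days


def slopes_around_dropout(theta, a, dropout_day):
    """Time slopes before and after dropout: theta*D* and (theta - a)*D*."""
    d_star = (dropout_day - T_MAX) / T_MAX
    omega = -a * d_star                      # slope change after dropout
    return theta * d_star, theta * d_star + omega


def sensitivity_predictor(delta, theta, a, dropout_day, t_std, change_point):
    """logit(mu^P_ij) for the CD4 <= 200 group with a kink at the dropout time.

    t_std and change_point must be on the same standardized time scale.
    """
    d_star = (dropout_day - T_MAX) / T_MAX
    omega = -a * d_star
    return delta + theta * d_star * t_std + omega * max(t_std - change_point, 0.0)


before, after = slopes_around_dropout(theta=-0.46, a=2.0, dropout_day=365.0)
print(round(before, 2), round(after, 2))   # 0.38 2.03
```

Because the kink term is zero up to the change point, the predictor is continuous at dropout, matching the dashed line in Figure 2.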

In Figure 3, we fix the nonidentifiable parameters $a_0$ and $a_1$ at various combinations of values and compare the estimated differences in depression prevalence between baseline CD4 groups for White women to check their sensitivity to $a_0$ and $a_1$. The results for Latinas and Blacks are similar. Estimates for the early period after enrollment are close across all model fits, including the original mTLV and MCLM fits. Depending on the specific combination of $a_0$ and $a_1$, the baseline CD4 group difference in depression prevalence is adjusted downward or upward in the later follow-up period. However, the pointwise 95% credible bands from the MCLM fit cover all these estimated depression prevalence profiles, even when we choose relatively large values of $a_0$ and $a_1$ (i.e. large changes in time slopes after dropout are assumed). In practice, caution is needed when choosing values or assigning priors for sensitivity parameters. In this example, we only showed the simple case of setting them as constants (i.e. assigning point-mass priors). Informative priors on sensitivity parameters can also be used, based on expert opinion and prior elicitation from previous studies (Daniels and Hogan, 2008).
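Curves like those in Figure 3 can be summarized from posterior draws as in the Python sketch below: for each draw of $\beta$, compute the marginal prevalence $\mathrm{logit}^{-1}(x_{ij}^T \beta)$ from (4.1) for each CD4 group, take the difference, and report its pointwise posterior mean and 2.5%/97.5% quantiles. The column order of the design rows (following $\beta_0, \ldots, \beta_7$ in (4.1), with White as the reference race) and all function names are assumptions made for illustration.

```python
# Illustrative sketch; function names and design-column order are assumptions.
import numpy as np


def expit(x):
    """Elementwise inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def white_design(t_std, cd4_high):
    """Rows of (4.1) for White women (reference race) at standardized times t_std."""
    t = np.asarray(t_std, dtype=float)
    one, zero = np.ones_like(t), np.zeros_like(t)
    if cd4_high:   # baseline CD4 > 200: beta_3, beta_6, beta_7 terms
        cols = [one, zero, zero, one, zero, zero, t, t ** 2]
    else:          # baseline CD4 <= 200: beta_4, beta_5 terms
        cols = [one, zero, zero, zero, t, t ** 2, zero, zero]
    return np.column_stack(cols)


def marginal_prevalence(beta_draws, design):
    """(n_draws, n_times) draws of mu^M = expit(x^T beta) along the design rows."""
    return expit(np.asarray(beta_draws) @ np.asarray(design).T)


def difference_band(prev_a, prev_b, level=0.95):
    """Posterior mean and pointwise credible band of prev_a - prev_b at each time."""
    diff = np.asarray(prev_a) - np.asarray(prev_b)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(diff, [tail, 100.0 - tail], axis=0)
    return diff.mean(axis=0), lower, upper
```

In a sensitivity analysis, the same summary would be repeated for the $\beta$ draws from each fit with fixed $(a_0, a_1)$.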

5. Discussion 

We have proposed a new model for dealing with informative dropout that occurs in continuous time. 
The marginal covariate effects of interest are directly modeled and the relationship between the binary 
responses and the dropout process is specified using linear or quadratic formulations in both conditional 
mean and dependence models. In our Bayesian approach, the continuous dropout time distribution is not 
modeled and its uncertainty is properly taken into account by Bayesian bootstrapping when obtaining 
marginal covariate effects. 

In this article, we focused on the scenario with dropouts only. There were 173 HERS women who actually finished the 12 scheduled visits. Su and Hogan (2010) distinguished these administratively censored patients from dropouts and allowed them to form a separate pattern in their varying coefficient modeling approach to these data. They found that the parameter estimates for responses from these patients were similar to those from later dropouts (e.g. those who finished 11 visits). Therefore, for simplicity, in the analysis reported in Section 4, we treated the follow-up times of administratively censored patients (ranging from 1952 to 2093 days) the same as dropout times. In practice, distinguishing administrative censoring from dropout might be more important when patients have staggered entry and informative dropout is present (Li and Schluchter, 2004). The proposed MCLM can be extended by allowing the parameters to depend on administrative censoring times through linear or quadratic functions that are distinct from those for dropout times.

We have assumed that the relationship between the dropout time and the binary responses follows linear or quadratic formulations. Unspecified smooth functions modeled by penalized splines (Ruppert and others, 2003) can be used to allow more flexibility in this relationship (Hogan and others, 2004; Su and Hogan, 2010). However, we found that the estimation of the dependence parameters is usually less stable than that of the mean parameters, owing to the sparse nature of binary data. Therefore, incorporating unspecified smooth functions in the mean structure of the MCLM is a more practical extension, and the penalized spline approach described in Su and Hogan (2010) can be applied straightforwardly.

Fig. 3. Sensitivity analysis for the MCLM of the HERS depression data: posterior mean estimates of the prevalence difference of depression between baseline CD4 groups (CD4 > 200 vs. CD4 ≤ 200) for White women with fixed values of the sensitivity parameters $a_0$ and $a_1$, compared with the results from the mTLV and MCLM (the results for Latinas and Blacks are similar); gray shading represents the corresponding pointwise 95% credible bands from the MCLM fit.

Supplementary material 
Supplementary material is available at http://biostatistics.oxfordjournals.org. 